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Abstract 

We present a novel approach to parameter estimation of systems with 
complicated dynamics, as well as evidence for the existence of a uni- 
versal power law that enables us to quantify the dependence of global 
geometry on small changes in the parameters of the system. This power 
law gives rise to what seems to be a new dynamical system invariant. 
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1 Introduction 

"...Since all our measurements and obser- 
vations are nothing more than approximations 
to the truth, the same must be true of all calcu- 
lations resting upon them, and the highest aim 
of all computations made concerning concrete 
phenomena must be to approximate, as nearly 
as practicable, to the truth." 

K. F. Gauss, "Theoria Motus Corporum 
Coelestium" (1809). 

Estimation theory deals with the problem of estimat- 
ing the state of a stochastic dynamical system from 
noisy observations. The earliest stimulus for its develop- 
ment was apparently provided by astronomical studies of 
planet and comet motion in the 18 t/l century. The mo- 
tion of these bodies can be completely characterized by a 
finite number of parameters and the estimation problem 
that was considered was that of inferring the values of 
these parameters from telescopic measurement data. 

To be more precise, suppose m measurement vector 
quantities yi , . . . , y m G M' are available at discrete in- 
stants of time t\, . . . ,t m . The parameter vector rel" 
which we wish to determine is assumed to be linearly 
related to the measured data, i.e. 

(1) Vk = M k x + v k 

where v k represent the measurement errors that occur 
at each observation time. Let x m denote the the esti- 
mate of x based on the m data samples {y\, . . . ,y m }. 
Then the residual r k (1 < k < m) associated with the 
k th measurement is defined to be the difference between 
the observed value y k and the value predicted from the 
estimate £ m : 



(2) 



r k = y k - M k x t 



Around 1795 Karl Friedrich Gauss invented the rev- 
olutionary method of least-squares for attacking the 
above problem. The method was independently invented 
and published by Legendre in 1806 in his book "Nou- 
velles methodes pour la determination des orbites des 
cometes". A detailed description of the method was pub- 
lished by Gauss in 1809 in his book "Theoria Motus Cor- 
porum Coelestium". The name "least-squares method" 
comes from the fact that the optimal estimate x m for x 
based on the observations Y m = {t/;}i<;< m is the value 
of x which minimizes an appropriately weighted sum of 
the squares of the residuals 



(3) 



L m - ^2rlW k r k 



k=i 



where the elements of the matrices W k are selected to 
indicate the degree of confidence that one can place on 
the individual measurements. 

We notice that the state/parameter vector x is as- 
sumed to be fixed in the above description. Moreover, 
the least-squares approach has no probabilistic meaning. 



It should be noted that Gauss also considered a proba- 
bilistic approach to the estimation problem, but rejected in 
favor of minimizing function 3. 



The general shift of attention to dynamical systems in 
the 20 t/l century, as well as the introduction of the max- 
imum likelihood method by R. A. Fisher in 1912 led to 
new developments in the field of estimation theory. Kol- 
mogorov in 1941 and Wiener in 1942 independently de- 
veloped a linear minimum mean square estimation tech- 
nique that received considerable attention and laid the 
foundations for the development of Kalman filter theory. 
This approach allowed for systems with state changing 
with time, as well as both continuous and discrete obser- 
vations. Kolmogorov'sand Wiener's work focuses on the 
analysis and synthesis of systems in terms of their input- 
output characteristics, reflecting the general trend in the 
scientific community at that time. The problems were 
formulated in terms of integral equations and the main 
tools used were the Laplace and Fourier transforms. 

Subsequent scientific developments have stressed the 
state space approach, which uses difference and differen- 
tial rather than integral equations for describing a sys- 
tem. Although both these approaches are mathemati- 
cally equivalent the latter proved much more convenient 
and useful, opening the door to many new developments. 

The state space reformulation of the estimation prob- 
lem suggested a recursive approach, first attempted for a 
specific system in 1955, by J. W. Follin at Johns Hopkins 
University. Five years later, R. E. Kalman published a 
very influential paper ([20]), in which he described an 
optimal recursive algorithm for solving the linear esti- 
mation problem using a general state space approach. 
This became known as the Kalman filter. In a sense, the 
Kalman filter is nothing but an efficient computational 
solution of Gauss's least-squares problem in a more gen- 
eral state space setting. 

The value of Kalman 's reformulation was that it led to 
significant new insights and had the effect of unifying all 
previous results. Moreover, the Kalman filter equations 
provided an extremely convenient procedure for digital 
computer implementation, and its recursive structure, 
together with the fact that the Kalman gain is inde- 
pendent of the observations and can be precomputed, 
opened the door for real time estimation. 

Nature is inherently nonlinear. Hence, in order to ap- 
ply estimation techniques to real physical systems, it was 
necessary to develop algorithms that can deal with non- 
linear dynamical systems. There are many reasons why 
nonlinear filtering theory is much harder than linear the- 
ory. The main difference is that the linear filter has es- 
sentially a finite description: the state of the system con- 
sists of the mean and the covariance matrix, which are 
enough to completely determine the conditional proba- 
bility density that contains all relevant information. In 
the nonlinear case, the state of the filter is infinite, con- 
sisting of the whole conditional density function used to 
compute oprimal estimates. The numerical problems in- 
volved in computing the whole probability density func- 
tion are, in general, intractable, since they involve the 
solution of complicated integro-differential equations or 
functional integral equations. 

In practice, finite approximations to the conditional 
probability density are considered. An approximate non- 
linear filter is obtained by parametrizing the conditional 



density via a finite set of parameters, and computing 
equations for the evolution of these parameters, which 
comprise the state of the system. One of the most popu- 
lar approaches used in practice, is to linearize the system 
in question and apply linear filtering theory. This gives 
rise to the Extended Kalman Filter and its variants. The 
main characteristic of all these approaches is that they 
try to follow locally a nominal trajectory, keeping track of 
how the state (finite approximation to an infinite condi- 
tional probability density) changes. This local character 
imposes serious limits to the applicability and effective- 
ness of these algorithms. 

We notice that the traditional estimation algorithms 
are in effect attempts to extend inherently linear ideas 
to the nonlinear setting. One can completely understand 
linear systems by looking at isolated integral curves, but 
this is hopeless for the nonlinear case, because of the 
immense complexity of nonlinear evolution. 

Henri Poincare was the first to realize that the at- 
tention should be shifted from an analysis of isolated 
trajectories and local characterizations, to a more global 
topological and geometric understanding of the phase 
space of nonlinear dynamical systems 2 . At the present 
time we are witnessing a spectacular blossoming of non- 
linear dynamics, made possible on the one hand by great 
theoretical developments on global topological and geo- 
metrical analysis, initiated by Poincare's revolutionary 
work in the 19 t/l century, and on the other hand by the 
wide availability of increasingly powerful computers. 

We believe that very interesting new insights and ex- 
tremely accurate novel algorithms can be obtained by 
attacking parameter estimation problems using a global 
geometrical point of view. Moving up one level of ab- 
straction we wish to consider parameter estimation algo- 
rithms whose primitive objects are geometric structures 
in phase space (represented as points in an appropriate 
function space), rather than points on isolated trajecto- 
ries. 

We demonstrate how to exploit the complexity of 
global geometrical phase space structures of nonlinear 
dynamical systems and their dependence on parameter 
variations in order to obtain extremely accurate param- 
eter estimation algorithms that do not depend on local 
approximations, in the context of complex analytic dy- 
namics. 

In particular, the global algorithm was tested on the 
family of quadratic maps of the Riemann sphere and ra- 
tional maps obtained from Newton's method on complex 
cubic polynomials. We show how to transform the esti- 
mation problem into a problem of minimizing a dissimi- 
larity measure between images containing global dynam- 
ical information. Our experiments indicate that the dis- 
similarity function to be minimized is locally unimodal. 
In fact it seems to obey an exact power law locally. The 
exponent appears to be a new invariant of these dynami- 
cal systems, which we call the parameter sensitivity expo- 
nent, and which characterizes the dependence of global 



2 It is interesting to note that, just like Gauss' idea in 
the case of estimation theory, the motivation for Poincare's 
development of global geometrical dynamics comes from the 
study of the motion of celestial bodies. 



geometric structures on parameter variations. 

The global algorithm gives extremely accurate esti- 
mates for the parameters of these systems systems (im- 
proving the initial estimate by more than 13 orders of 
magnitude in a certain case). Moreover, appears to be 
very robust with respect both to observation noise and 
dynamical noise. 

2 Parameter Estimation and Global 
Geometry 

The study of how geometric structures in phase space 
change as system parameters vary is of great interest 
and has received much attention. The main focus so far 
has been the study of changes in the topology of phase 
space structures (bifurcation theory). 

Our objective is to exploit the way exact geometrical 
rather than topological features of phase space structures 
change as system parameters vary slightly, in order to 
obtain novel extremely accurate parameter estimation 
algorithms that do not depend on local approximations. 

This approach led us to the discovery of what seems to 
be a very general power law that enables us to quantify 
the dependence of global geometry on small changes in 
the parameters of the system. 

In order to demonstrate our approach we will restrict 
our attention to how basins of attraction change as pa- 
rameters vary, and show how to transform the param- 
eter estimation problem into an optimization problem 
over an appropriate space of functions containing global 
dynamical information. 

Our discussion in this paper will be restricted to es- 
timating a single parameter of a system, but our tech- 
niques can be readily generalized to higher dimensional 
problems. 

We begin by demonstrating this approach for the fam- 
ily of complex quadratic polynomials. 

2.1 The Quadratic Family 

Given any complex quadratic, polynomial p(z) = az 2 + 
262: + d, let M(z) - az + 6 and c = ad + b - b 2 . If 
f c : C -» C denotes the map f c (z) = z 2 + c, where C is 
the Riemann sphere, then: 

(4) M- 1 of c oM(z) = M-\(az + b) 2 + c) 

= M~ x {a 2 z 2 + 2abz + b 2 + c) 
_ (a 2 z 2 + 2abz + b 2 + c) - b 

a 
= p(z) 

i.e. p and f c are (analytically) conjugate. It follows 
that in order to understand the dynamics of all complex 
quadratic polynomials, it is enough to understand the 
dynamics of the complex one-parameter family 

<2 = {/e : C - C, c G C, f e (z) = z 2 + c) 

Both the variable z and the parameter c fill out a com- 
plex plane. We will refer to the 2-plane as the dynamical 
plane and to the c-plane as the parameter plane. 



2.1.1 Julia Sets 

Given_ a rational map / : C — + C of the Riemann 
sphere C = C U {oo}, we can get a dynamical system by 
repeated application of /. In the begining of the twen- 
tieth century, the French mathematicians P. Fatou and 
G. Julia studied the iteration of complex polynomials of 
degree d > 2. Having at their disposal a powerful the- 
orem of Montel that gave a sufficient condition for the 
normality of a family of meromorphic functions, they re- 
alized that it is very interesting to consider the following 
decomposition of the dynamical plane: 

Definition 2.1 A point z £ C is an element of the Fa- 
tou set, Fj of a rational mapping f , if there exists a 
neighborhood U of z, such that the family of iterates {f n } 
is a normal family on U. The Julia set Jf of f is the 
complement of the Fatou set. 

where, f n denotes n-fold functional composition of / by 
itself. 

Without recalling the exact definitions, let us only re- 
mark that normal families have values that do not di- 
verge under iteration. So, in some sense the Fatou and 
Julia sets of / are the sets of stable, unstable points of C 
with respect to /, respectively. 

We notice that the point at infinity oo is always an 
attracting fixed point for quadratic maps 3 . Let 

j4 c (oo) = {z e C : /" — > oo as n —> oo} 

be the basin of attraction of infinity. We have the fol- 
lowing result: 

Proposition 2.1 The Julia set Jj c of f c : C — ► C, 
fc(z) = z 2 -\- c, is equal to dA c (oo). 

The proof of this statement is a direct consequence of 
the fact that the boundary of any completely invariant 
component (here A c (oo)) of the complement of the Julia 
set has to equal the Julia set. 

The set Kf c = C — A c (oo) is called the filled in Julia 
set. We clearly have dK c = Jj c = dA c (oo), i.e. Jf c 
separates competition between orbits being attracted to 
oo and orbits remaining bounded for all time. 

Example 1: Consider the map / : C — > C, fo(z) — 
z 2 . Clearly any point outside the unit circle .S' 1 has an 
orbit that escapes to infinity. Moreover, any point in- 
side the unit circle has an orbit converging to 0. Conse- 
quently, the Julia set Jf of / is the unit circle S 1 . 

Example 2: The Julia set of the map /_ 2 : C -»• C, 
f-2{z) = z 2 — 2 is the interval [—2, 2] on the real line 4 . 

In general, Julia sets are not smooth, but very com- 
plicated fractal objects, exhibiting an amazing variety of 
geometric forms as the parameter c changes. Figure 1 
(taken from [26]) shows examples of Julia sets of complex 
quadratic polynomials corresponding to various points c 
on the complex plane, along with the Mandelbrot set 
which controls their topological structure. 



2.1.2 The Mandelbrot Set 

In 1905, P. Fatou proved the following very surprising 
result 5 : 

Theorem 2.1 Every attracting cycle for a polynomial 
or rational function attracts at least one critical point. 

Each quadratic polynomial f c in Q has a unique crit- 
ical point, namely z = 0. The corresponding critical 
value is f e (z ) = / c (0) = c. It follows that f c can have 
at most one attracting cycle in the complex plane. More 
generally, a polynomial of degree d > 2 can have at most 
d — 1 attracting cycles. 

In 1918-1919 P. Fatou and G. Julia proved another 
result which further supported their conjecture that the 
dynamical behavior is dominated by the behavior of crit- 
ical points: 

Theorem 2.2 Let Qj denote the set of critical points 
for a polynomial / : C — ► C, and let Kf be the set of all 
points jn C which do not escape to infinity under f, i.e. 
K f = C-A(oo). Then: 

1- Qj C Kf ■£> Jf is connected. 

2. Q f n Kf = => J f is a Cantor Set. 

Since for a quadratic map f c , there exists only one 
critical point namely zq = 0, an immediate corollary of 
theorem 2.2 is the following: 

Corollary 2.1 The Julia set Jf c of the quadratic map 
f c is either connected or a Cantor set. Moreover, Jf c 
is connected if and only z//"(0) does not tend to oo as 

71 — ► OO. 

The above corollary suggests a natural decomposition 
of the parameter plane into the Mandelbrot set 



(5) 



M = {ce€: J fc is 



lected] 



and its complement C—M. Moreover, corollary 2.1 gives 
us a way to compute the Mandelbrot set: in order to 
check whether a point c of the parameter plane is in M, 
it is enough to check whether the orbit of under f c does 
not tend to infinity. 

We remark that sets similar to the Mandelbrot set oc- 
cur in many other families of complex analytic maps. 
This happens since many mappings or their iterates lo- 
cally behave like a quadratic polynomial. Hence the 
Mandelbrot set is in some sense a universal object. 

The boundary dM of the Mandelbrot set is a bifur- 
cation set, i.e. the topological nature of the Julia set 
changes as we cross this set in the parameter plane. In 
the next sections we will investigate how the geometry 
rather that the topology of Julia sets depends on param- 
eter variations. 



' In fact oo is an _a.ttracting fixed point for all comple 
polynomial maps on C 

The proof is not obvious. See [6]. 



The proof can be found in [9]. 






Figure 1: Julia sets of complex quadratic polynomials surrounding the Mandelbrot Set which controls their structure. 
The picture is taken from [26]. 



2.2 Global Parameter Estimation for the 
Quadratic Family 

The complex one-parameter family of dynamical systems 



(6) 



Zn + l = Z n +C 



can be considered as the following real two-parameter 
family 



(7) 
(8) 



x n+1 = xl-yl + X 
y n+ i = 2x n y n -K 



under the usual identification of E 2 with C sending z n to 
Xn + Vni and c to A+^'. We want to consider the problem 
of estimating A, when £ is held constant, in the presence 
of observation and dynamic noise. In particular, suppose 
the real system has noisy dynamical evolution: 

(9) x n+l = x 2 n -y 2 n + \ + v dtX {n) 

(10) y n+ i = 2x n y n +Z + v d , y (n) 

and we actually observe: 



X-n — x n t "o^^Wj 
Vn = Vn +V 0i y(n) 



(11) 

(12) 

where 

both that the dynamical {vd, x (n)} n£Z+ , {v d ,y(n)} ne z + 
as well as the observation {v( ?l )}"€Z + , {\y( n )}n6Z+ 
noise sequences are white Gaussian random sequences. 

2.2.1 Setting up global functions for quadratic 
maps 

The primitive objects used in local methods are points 
on isolated trajectories. In the next section, we will show 
how to obtain extremely accurate estimates of the pa- 
rameter A by moving up one level of abstraction and 
consider an algorithm that uses representations of the 
Julia sets of the maps as primitive objects. In this sec- 
tion, we describe how to use proposition 2.1 in order to 
obtain a discrete^ representation of Julia sets of quadratic 
maps f c : C — ► C, f c .(z) = z 2 -\-c. The method we will de- 
scribe is often refered to as the Level Set Method (LSM) 
([26], [27]). 

We restrict our attention to a domain DcC^lxl, 
on which we impose a grid of n x m cells. Choose a large 
integer N max (iteration resolution) and an arbitrary set 
T (target set) containing oo. We will take T = {z : 
\\z\\ > R}, where R is a large number 6 . For each cell 
in the decomposition of the domain D assign an integer 
label l c (p) — l c (p,T), where p is the centerpoint of the 
cell, in the following way: 

(13) 

(k provided f c (p) £ T and /*(p) £ T, 

'c(p) = { for <i < k and k < N max 

1.0 otherwise 

If l e (p) is nonzero then p escapes to infinity and l c (p) is 
the escape time (measured in the number of iterations) 
needed to hit the target set T around oo. The contours 
obtained by the above algorithm are approximations of 

In most of our experiments we take R = 100. 



the equipotential curves of the filled in Julia set K c when 
J c is connected (see for example [27]). 

As c moves on the parameter plane where the Mandel- 
brot set lives, the corresponding Julia set changes from 
shape to shape producing an immense variety of possible 
geometric forms (figure 1) 

Let us fix £ to the value £ = 0.3, and consider how 
the geometry of the Julia sets changes as A varies locally 
around the value A = — 1. Figures 2, 3, 4 show a window 
of the Julia set for A = -1.0, A = -1.00001, A = -1.0001 
respectively. We see that the human eye can clearly sense 
changes of the order 10 -4 or so and tell which phase 
image from 3 and 4 is closer to 2. We expect that by 
comparing these images we can sense very small changes 
in parameters and obtain extremely accurate estimation 
algorithms. In the next section we describe how to turn 
the above intuitive approach of comparing images to get 
an estimate of a parameter into a precise algorithm. 

2.2.2 The Global Approach 

In order to be able to store in the computer and ma- 
nipulate the images that we get by running LSM on a 
domain D C R x 1, we chose to represent them as two 
dimensional arrays A = A\(D) = ((*i t j)i<i<n,i<j<m- 

The array A = A\(D) is a lookup table (a discrete 
representation) of a function: 

F A : D -► M 

which we will refer to as the global function for f c on D. 
Before we proceed, let us recall that if (X, /.i) is a mea- 
sure space, for p > 1, D> = L?(X,fi) = {f : X -> R 
measurable such that f x ||/||rf// < oo}. It is a standard 
result of functional analysis that L p (X,/.i) is a vector 
space and ||.|| p is a norm on L v (X^l). For / £ U the 
value: 



(14) 



11/11, 



■a 



ll/ll p ^ 



i/p 



is called the L^-norm of/. Let us define the //-distance 
between two functions f,gEL p to be: 

(15) dpV,9) = \\f-9\\p 

Let us choose the domain D to be a compact rectangle 
in E 2 . Then the function F x E L p , where X = D and 
fi is the Lebesgue measure on D C M 2 . The distance d p 
gives us a measure of how different F\, F^ are for A,/< 
two different values of the parameter. If 

^A = { a i,j)l<i<n,l<j<m 
An = ( a i,j)l<i<n,l<j<m 

are the discrete representations of F\, F^ respectively, a 
natural measure of their difference, is the discrete L p - 
distance (p < oo): 

(16) 

d p (A x ,A fl ) = \\A X 



A n \\n — 



l H\\P 



r n m 

EE 

t=i j=i 



-|i/p 



*j 



^ up 



5 



From now on we will tend to use the same notation for 
both the continuous quantities and their discrete repre- 
sentations. 




Figure 2: Julia Set for A = -1.0, £ = 0.3, D = [-0.079555, -0.079525] x [0.265320,0.265:350]. 




Figure 3: Julia Set for A = -1.00001, £ = 0.3, D = [-0.079555, -0.079525] x [0.265320,0.205350]. 




Figure 4: Julia Set for A = -1.0001, £ = 0.3, D = [-0.079555, -0.079525] x [0.265320,0.265350]. 



It would seem that the L p distance of global functions 
F\, Fn is a rather coarse characterization of the differ- 
ences of the corresponding images. It turns out that it 
is ideally suited for doing parameter estimation in the 
presence of noise. The reason is that, intuitively, taking 
the L p distance of global functions has the effect of av- 
eraging away the effects of noise, returning a measure of 
the difference between the images which is rather noise 
insensitive. 

Given a parameter value A, let us define the following 
functional: 

(17) il> x :L p ^R 

(18) MG) = d p (F x ,G) 

Let us now define the dissimilarity function for parame- 
ter A to be the function: 

(19) <£> A :M-+IK 

(20) <Px(/i) = M^) = d p (F Xl F lt ) 

As fi — ► A we expect <p\(n) = d p (Ax,A fi ) — > 0. In- 
tuitively, we expect the dissimilarity function ip\ to be 
unimodal 7 . locally around A with a local minimum at 
/< = A. 

Our experiments not only confirm the above conjec- 
ture, but indicate that there is a lot of structure in the 
way the dissimilarity function decreases to the local min- 
imum: an exact power law is obeyed around A. More- 
over, local unimodality of f\ around A seems to be very 
robust with respect to dynamical and especially obser- 
vation noise. 

Once we know the dissimilarity function <p\ is locally 
unimodal, with the real value of the parameter A be- 
ing the local minimum, we can use one of the standard 
optimization algorithms to determine A. In our experi- 
ments we use the Golden Section Method, described in 
appendix A. This method uses the fact that we can 
bracket the location of the minimum of a unimodal func- 
tion by evaluating the function at two distinct points in 
the region L of unimodality. 

2.3 Performance of the Global Algorithm for 
the Quadratic Family 

In this section we give a list of sample runs of the global 
algorithm for quadratic maps f c : C — ► C, f c {z) = z 2 + c. 
If c = \+£i we want to consider the problem of estimat- 
ing A assuming f is fixed, in the presence of observation 
and/or dynamical noise (equations 11, 9). 

Let F\ be the (noisy) global function whose discrete 
representation A\ is obtained by performing measure- 
ments (according to equation 11) on the noisy real sys- 
tem. Local minimization of the dissimilarity function 



A function ip : K — ► R is said to be unimodal on a closed 
interval L C K if there is an x* £ L such that x* minimizes 
ip on L and, for any two points x u x 2 € L, such that xi < x 2 
we have: 

X 2 <X* =»/(*l)>/(* 2 ) 

** <x\ => f{x 2 ) >/(xi) 
Note that unimodal functions are not necessarily differen- 
tiable or even continuous. Strictly convex functions and most 
of their generalizations are unimodal. 



fx(v) = d 1 (F x ,F,) 

is obtained using the Golden Section Method. Phase 
windows are represented as 512 x 512 arrays, and the 
Boundary Scanning algorithm used, checks for escape 
or orbits out of a circle of radius 100 centered at the 
origin. The noisy phase portrait A\ corresponds to A = 
— 1.0 and is obtained with Gaussian observation noise 
with variance a = 10~ 3 and no dynamic noise. The 
accuracy achieved gives an upper bound on the distance 
of the global method estimate from the real value of the 
parameter. 

• A = 0.3 

1. Domain: [-1.5, -1.0] x [0.0, 0.5] 
Accuracy Achieved : 10 -10 

Number of Orbit Points in typical image : 

4120137 

Average Number of Orbit Points per cell : 15 

2. Domain: [-1.445, -1.37] x [0.05, 0.2] 
Accuracy Achieved : 10 -11 

Number of Orbit Points in typical image : 

7150541 

Average Number of Orbit Points per cell : 27 

3. Domain : 
[-1.40507,-1.40506] x [0.100155,0.100165] 
Accuracy Achieved : 10 -13 

Number of Orbit Points in typical image : 

17446074 

Average Number of Orbit Points per cell : 66 

4. Domain 

[-0.079555,-0.079525] x [0.265320,0.265350] 

Accuracy Achieved : 10" 15 

Number of Orbit Points in typical image : 

33692290 

Average Number of Orbit Points per cell : 128 

5. Domain : 
[-0.079545,-0.079535] x [0.265330,0.265340] 
Accuracy Achieved : 10~ 16 

Number of Orbit Points in typical image : 

36951018 

Average Number of Orbit Points per cell : 140 

• A=0.35 

1. Domain: [-1.445, -1.37] x [0.05, 0.2] 
Accuracy Achieved : 10~ 9 

Number of Orbit Points in typical image : 

3905591 

Average Number of Orbit Points per cell : 15 

2. Domain: [-1.4075, -1.4070] x [0.0992, 0.0997] 
Accuracy Achieved : 10 -10 

Number of Orbit Points in typical image : 

9058660 

Average Number of Orbit Points per cell : 35 

3. Domain : 
[-1.40715,-1.40710] x [0.09957,0.09962] 
Accuracy Achieved : 10 -11 

Number of Orbit Points in typical image • 

12225105 

Average Number of Orbit Points per cell : 47 

• A=0.40 



1. Domain: [-1.445, -1.37] x [0.05, 0.2] 
Accuracy Achieved : 10~ 8 
Number of Orbit Points in typical image : 
3100158 
Average Number of Orbit Points per cell : 12 

2.4 Cayley's Problem and Newton Basins 

In this section, we consider the problem of estimating 
a parameter in a dynamical system obtained from New- 
ton's method for cubic complex polynomials. 

Newton's method and its variants are among the most 
prominent numerical methods for finding solutions to 
nonlinear equations. From a numerical point of view 
Newton's method has always been considered a local 
method, i.e. one assumes that the initial guess is suffi- 
ciently close to a root, and then the orbit under Newton's 
iteration scheme tends to this root. 

In 1879 A. Cay ley, in a one page paper [10], sug- 
gested the extension of what he calls the Newton-Fourier 
method 



(21) 



x k +i = N(x k ) = x k - h 



/(**) 



applied to complex polynomials / : C — ► C, where h is 
a real number. It is interesting to note that 21 can be 
interpreted as the Euler method with stepsize h for the 
initial value problem: 



(22) 



x{t) 



/(*(<)) 
/'(*(*)) 



(23) *(0) = xo 

Each of the roots of / is an attracting fixed point of 
the Newton-Fourier iteration. Cayley suggested study- 
ing the method globally, i.e. understanding the geometry 
of the basins of attraction of the roots in phase space. 

The problem is easy in the case of quadratic polynomi- 
als: we have seen that any quadratic map is analytically 
conjugate to one of the form f c (z) — z 2 + c. Newton's 
method for f c is a rational map of degree 2: 

,2 



(24) 



N(x) = 



X* + c 
2x 



It can be shown that the Julia set J/v of N is the perpen- 
dicular bisector of the segment joining the roots ±y/c. 
Thus the basins of attraction of the two roots are the 
half planes denned by J^. 

The geometry of the problem for higher degree poly- 
nomials is extremely complicated. To get a feeling for 
why that should be so, it is enough to note that we have 
an poynomial / of degree n, then if A{ is the basin of 
attraction of the i th root pi of /, we must have: 

Jf = dAi , i = I, ... ,71 

i.e. each point in the Julia set Jf must touch simultane- 
ously all basins of attraction. Figure 5 shows the Newton 
basin portrait for the cubic 

q(z) = (z- 0.5i)(* + 0.5i)(* - 1) 

The values assigned to each cell in phase space corre- 
spond to convergence time to a root. If k is conver- 
gence time of the center point of a cell to pi where 



1 < i < 3, then the value assigned to the cell in question 
is 3Jfc + (i- 1). 

Suppose pi,p2 are known. We want to consider the 
problem of estimating p 3 in the presence of observation 
and dynamical noise. We let 

9px,P2,fiM = ( z ~ Pi)( z ~ P2)(z ~ Pz) 

and define F Pl>P2i \ to be the Newton basin global image 
corresponding the polynomial q Pl! p 2> \. Suppose the real 
value for A is A = 1.0 and let us restrict A to move on the 
real axis. This gives us a situation exactly analogous to 
the one for Julia sets of complex quadratic maps. Again 
we get an estimation algorithm by trying to minimize: 

<p PltP2 :M — ]R 

where F PuP2i \ is the noisy Newton basin global function 
for the third root equal to A. 

2.5 Performance of the Global Algorithm for 
Newton's Basins 

In this section we give a list of sample runs of the global 
algorithm for the case of dynamical systems obtained by 
Newton's method. Local minimization of the dissimilar- 
ity function <p PliPa where p x — 0.5i and p 2 = —0.52, is 
obtained using the Golden Section Method. Global func- 
tions are represented as 512 x 512 arrays. If the number 
of iterations it takes for the centerpoint of a given cell to 
enter a neighborhood of one of the roots, say pi (where 
P3 = p is the parameter) is k, then the value assigned to 
each the cell in question is 3fc+ (i— 1). The noisy phase 
portrait F Pl>P2) \ corresponds to A = 1.0 and is obtained 
with Gaussian observation noise with variance <t = 10~ 3 
and no dynamic noise. The accuracy achieved gives an 
upper bound on the distance of the global method esti- 
mate from the real value of the parameter. 

1. Domain: [-0.044, -0.024] x [-0.105, -0.085] 
Accuracy Achieved : 10 -9 

Number of Orbit Points in typical image : 10538413 
Average Number of Orbit Points per cell : 40 

2. Domain : 
[-0.033842, -0.033822] x [-0.093942, -0.093922] 
Accuracy Achieved : 10 -13 

Number of Orbit Points in typical image : 25907786 
Average Number of Orbit Points per cell : 99 

3. Domain 

[-0.033833, -0.033832] x [-0.0939325, -0.0939315] 
Accuracy Achieved : 10~ 14 

Number of Orbit Points in typical image : 34889936 
Average Number of Orbit Points per cell : 133 

4. Domain : [0.075,0.080] x [0.077,0.080] 
Accuracy Achieved : 10 -8 

Number of Orbit Points in typical image : 12464288 
Average Number of Orbit Points per cell : 48 
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Figure 5: Newton Basins for f(z) = (z - 0.5i)(z + 0.bi)(z - 1), and domain D = [-0.5, 1.5] x [-1.0, 1.0]. 
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Figure 6: Quadratic Family: Plot of 6F\ s\ as a func- 
tion of ||<$A||, 6\ > 0, for A = -1, £ '= 0.3. The 
maximum number of iterations is 100, and the do- 
main is D = {(x,y) : x e [-0.079555, -0.079525], y e 
[0.265320, 0.265350]}. A 512 x 512 cell resolution is used. 
The real image is measured with Gaussian observation 
noise with variance a — 10~ 3 and no dynamical noise 
(cr<i = 0). The plot on the top shows some sample points, 
and the one on the bottom is the same plot with straight 
lines connecting the sample points. 



Figure 7: Quadratic Family: Plot of \og(6F\ s\) as a 
function of log||<5A||, 6X > 0, for A = -1, £ = 0.3. 
The maximum number of iterations is 100, and the do- 
main is D = {(x,y) : x £ [-0.079555, -0.079525], y G 
[0.265320,0.265350]}. A 512x512 cell resolution is used. 
The real image is measured with Gaussian observation 
noise with variance a = 10~ 3 and no dynamical noise 
(a d = 0). The plot on the top shows some sample points, 
and the one on the bottom is the same plot with straight 
lines connecting the sample points. 
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Figure 8: Quadratic Family: Plot of \og(8Fx ! sx) as a 
function of log||<SA||, 8\ > 0, for A = -1, £ = 0.3. Each 
line corresponds to a different domain. The resolution 
increases from right to left. The real image is measured 
with Gaussian observation noise with variance a = 10 -3 
and no dynamical noise. 



Figure 9: Quadratic. Family: Plot of \og(8F x> 6x) as a 
function of log||£A||, 8\ > 0, for A = -1, £ = 0.3. Each 
line corresponds to a different domain. The resolution 
increases from right to left. The real image is measured 
with Gaussian observation noise with variance <j — 10 -3 
and no dynamical noise. 



13 



5. 10 



1. 10 



10000. • 



..'v 



,m " 



-13 -11 -9 -7 
1. 10 1. 10 1. 10 1. 10 



1. 10 







-12 -10 -8 
1. 10 1. 10 1. 10 1. 10 



.01 1 




-13 -11 
1. 10 1. 10 1. 10 



1. 10 



1. 10 



-12 -10 
1. 10 1. 10 1. 10 



).01 



1. 10 



Figure 10: Newton Basins: Plot of \og<p Pl>P2 (\ + 6X) as 
a function of ||<5A||, <5A > 0, for pi = 0.5z,/9 2 = -0.5i, 
where A = 1. The real image is measured with Gaus- 
sian observation noise with variance a = 10~ 3 and 
no dynamical noise (cr d = 0). The plot on the top 
shows some sample points, and the one on the bottom 
is the same plot with straight lines connecting the sam- 
ple points. A 512 x 512 cell resolution is used over the 
domain D = {(x,y) : x e [-0.033842, -0.033822], y e 
[-0.093942,-0.093922]. 



Figure 11: Newton Basins: Plot of \og<p PliP2 (\ + 6\) as 
a function of ||<5A||, 6\ > 0, for p x — 0.5i,p 2 = -0.5i, 
where A = 1. The real image is measured with Gaussian 
observation noise with variance a — 10 -3 and no dy- 
namical noise (<r d = 0). The plot on the top shows some 
sample points, and the one on the bottom is the same 
plot with straight lines connecting the sample points. A 
512 x 512 cell resolution is used over the domain D = 
{(x,y) : x £ [-0.044, -0.024], ye [-0.105,-0.085]}. 
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Figure 12: Newton Basins: Plot of \ogip PltP2 (X + <5A) 
as a function as a function of log||<5A||, 6X > 0, for 
p\ = 0.5i,/)2 = — 0.5z', A = 1. Each line corresponds to a 
different domain. The resolution increases from right to 
left. The real image is measured with Gaussian obser- 
vation noise with variance cr = 10~ 3 and no dynamical 
noise. 



3 The Parameter Sensitivity Power Law 

"Let chaos storm! 

Let cloud shapes swarm! 

I wait for form. " 

R. Frost, "A Further Range: Ten Mills, (V) 

Pertinax" (1936). 

The numerical experiments not only confirm that the 
dissimilarity function <p\ is locally undimodal around A, 
but indicate that there is a lot of structure in the way it 
decreases to the local minimum. In particular, if F\ is 
the global function corresponding to A, and /< = A + 8X, 
then for ||<5A|| small enough, a power law of the form: 



(25) 



SF Xi5x = d p (F x+ sx, F x ) = M\\6X\\ d 



is obeyed. We will refer to equation 25 as the parame- 
ter sensitivity power law. 

Let us consider, for example, the quadratic map 
f c (z) = z 2 + c, c = X + £i. Figure 6 is a plot of 
di(F\+s\, F\) as a function of 6X, for A = — 1, £ = 0.3. 
They shape of the curve that we get immediately sug- 
gests a power law around A = — 1 with exponent less 
than 1. If in fact a power law of the form of equation 25 
holds, then taking the logarithms of both sides gives a 
straight line of slope d: 



(26) 



\og8F x ,6x = rflog||6A|| + m 
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where m = logM. Figure 7 shows a log-log version of 
figure 6, which is in fact a straight line. We give the 
following definitions: 

Definition 3.1 Suppose that the local power law holds 
for some global function Fx = F x ,d '■ D — ► R. We define 
the parameter sensitivity exponent (p.s.e.) 7 of the global 
function Fx to be 7 = 1 — d. Moreover, we define m = 
logM to be the resolution factor of Fx- 
The parameter sensitivity exponent 7 is a measure of 
the performance of the global algorithm, since it quan- 
tifies our ability to distinguish nearby global functions 
(images). If 7 = 0, i.e. a linear power law is obeyed, the 
global function is rather insensitive to parameter varia- 
tions. The performance of the global algorithm improves 
as 7 gets closer to 1, i.e. as the slope d in the log -log 
plot decreases towards 0. 

Looking at the plot of \ogdi(Fx+s\, Fx) as a function 
of log||5A|| for different domains of a system and putting 
the resulting plots together (for example figures 8, 12), 
leads to the conclusion that the slope d changes very lit- 
tle as we change our focusing window, i.e. the domain D, 
in phase space! Hence it makes sense to talk about the 
parameter sensitivity exponent of the system. All experi- 
ments performed indicate that the following conjectures 
are true: 

Conjecture 3.1 The parameter sensitivity exponent of 
a system is the sa7ne for all typical domains 8 . 

8 By typical we mean that the global function F\ : D — ► R 
is representative of the complexity of the system. For exam- 
ple, in the case of connected Julia sets a typical domain is any 
domain near the boundary dA c (oo). A domain lying wholy 
in the interior of the Julia set, in which the global function 
is identically zero, is not a typical domain. 
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Figure 13: Quadratic Family: Plot of \og(8F x sx ) as a 
function of log||«5A|| for A = -0.12 and £ = 0.74. The 
maximum number of iterations is 100, and the domain is 
D = {(x,y) :xe [0.3, 0.4], ye [0.3,0.4]}. The lines from 
left to right are for resolutions of 512 x 512, 256 x 256, 
128 x 128, 64 x 64 cells. 



Consequently, the parameter sensitivity exponent 
seems to be a new dynamical system invariant 

that quantifies the dependence of global geometry 9 on 
variations of the parameter. 

Conjecture 3.2 The parameter sensitivity exponent of 
a system is independent of the cell resolution 10 used 
in approximating the global function. Higher resolution 
only increases the resolution factor m. 

Of course the parameter sensitivity exponent (p.s.e.) is 
not exactly the same for all resolutions, because we get 
discretization errors in low resolutions. The p.s.e stays 
close to a dynamical system invariant and converges to 
it as the number of cells goes to infinity. An example on 
which conjecture 3.2 is tested is show in figure 13. 

To gain some more insight into the power law let us 
rewrite equation 25 as: 



(27) 



8F. 



x,sx 



M 



M 



8X \\6\\\*-* ||*A||- 



Consider the limit 



(28) 



(dF\ 
V dX 



Lp 



lim 

<5A-fO 



8F 



A,<5A 



11**11 



or more generally, of global information as represented 
by the global function F\. 

10 The cell resolution is the number of cells used in the finite 
representation of the global function. 



The above limit is a mean derivative in ZAnorm of the 
global function F x with respect to the parameter A. We 
see that when d = 1, then the limit exists. When d < 1 
the limit 28 does not converge, and the parameter sensi- 
tivity exponent j = 1 — d measures its rate of divergence. 
Remark: The knowledge that a power law is obeyed 
locally around the true value A of the parameter can be 
used to improve the performance (number of function 
evaluations) of the global approach enormously! More- 
over, it can be used to check for errors and place a safe 
bound on the distance from the real value of the param- 
eter. 

3.1 The power law for smoothly changing 
global functions 

In this section we demonstrate that if a global function 
is sufficiently smooth with respect to changes in the pa- 
rameter, then a linear power law must be obeyed locally. 
In particular, we prove the following proposition: 

Proposition 3.1 Given a global function F\ : D —> 
R, if for every point x G D, F x (x) is twice, differen- 
tiable with respect to the parameter A and the derivative 
d?F x (x)/dX 2 is continuous and bounded in a neighbor- 
hood U of X, then a linear power law is obeyed locally 
for any V -norm. Moreover, the L p -derivative of F\ 
with respect to A exists and is equal to the space aver- 
age \\dF x /dX\\ LP . 

Proof: 

The proof is a straightforward application of Taylor's 
formula. Since F x (x) has a continuous second derivative 
with respect to A, then for any // = A + 8X £ U we have: 



(29) 
(30) 



F »(*) = it^r(t t - x ) k + E n.M 



k=Q 



k\ 



En,M = ^i f\»-t) n F?+ l (z)dt 

n - JX 



where F£(x) = d k F x (x)/dX k and n = 1. If A is an upper 
bound for F?{x) over U , then 

(3i) PM/0<!||/*- 

Consequently, for 6A small enough we have 

dF x (x) 



| 2 = !ll*A|| 2 



(32) 



\\F x+6X (x)-F x (x)\\*\\. 



dX 



8X\\ 



The ZAdistance between F x and F M is: 
(33) SF(X 1 6X) = d p (F x+6Xi F x ) 



-tt 



\\F x+sx (x)-F x (xWdx 



i/p 



Hence, for sufficiently small ||£A||: 

t dF x 



(34) 8F(X,8X) 



(I 



\\-^\\ p dx \\8X 



dX 



i/p 
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The above linear power law implies that: 
(35) 

(**±) = lim 6F *M 
V dX ) LP «5™o ||fiA|| 



-(/.■$ 



\\ p dx 



i/p 



□ 

The above proof not only shows that a power law holds 
locally, but actually it holds pointwise. The following 
lemma gives us a simple test for pointwise validity of the 
power law. 

Lemma 3.1 If a power law holds pointwise in norm 
L Po , then the power law with the same exponent holds 
for any norm L p . 

The proof is trivial. It turns out that when the expo- 
nent of the parameter sensitivity power law is not 1, the 
law does not hold pointwise, but is the effect of spatial 
averaging in L p -norm. Figure 14 gives an example of a 
system where this is tested using lemma 3.1. 
Consider now the linear differential equation 



(36) 



dx 
~dt 



= h(X)x 



on the real line IK. The solution of the above equation 
(equation 36) satisfying the initial condition x(0) = xq 
is given by 



(37) 



x(t) = a; e*< A >* 



Suppose h(X) > 0. In this case, we have a repelling 
fixed point at x = 0. Given a domain D = [a,b] C 15k, 
and we define a function F\ = F\ t M '• D —+ 15k assigning 
to each point in the domain D its escape time from a 
large interval [—M,M]. Let us ignore the point x = 0, 
where F\(x) = 0, since it does not change anything when 
we integrate over it. For x different from zero we have: 



F x (x) = 



InM 



tf(A) 



For most smooth functions g the above global function 
satisfies the requirements of proposition 3.1 hence a lin- 
ear power law is valid. 

For example, if g(X) = e A , we get 



d 2 F x (x) d 2 
Y^- = (InM - ln||z lh — 



1 



dX' v " "'d\ 2 \g{\) 

= (lnM-ln||;r||)e- A 

which is continuous and bounded for any A. Similarly, 
for g{\) = A a linear law holds around any A different 
from 0. 

The same reasoning applies to the case when g(X) < 
0. The only difference is that F\(x) measures the time 
needed to enter some predetermined neighborhood of the 
attracting fixed point 0, instead of a neighborhood of 
infinity. 

Let us now consider a linear system 



(38) 



dX Af\\ 

— = A(X)x 
dt v y 



where x E IK n . A general solution to (38) can be obtained 
by a linear superposition of n linearly independent solu- 
tions {x 1 ^),... ,x n (t)}: 



(39) 



K0 = !>*>■(<) 



where the n unknown constants c.j are to be determined 
by initial conditions. If .4(A) has n linearly independent 
eigenvectors v J ' , 1 < j < n, then we may take as a basis 
for the space of solutions the vector valued functions 



(40) 



x j (t) = e A 'V 



where Xj is the eigenvalue associated with u> . Let us 
assume that al least one of the eigenvalues is real and 
positive. Let Ai be the largest of the positives eigenval- 
ues. Then for t very large we have 



:(t) 



cie Al V 



Hence escape time from a very large circle around the 
origin is approximately 



Fx(x) 



InM - lnHc^ 1 !! 
MA) 



Essentially the problem is reduced to the one- 
dimensional problem considered above. If A is a suf- 
ficiently smooth function of A, the maximum eigenvalue 
Ai is a sufficiently smooth function of A and the require- 
ments of proposition 3.1 are satisfied. All this can be 
made more rigorous. 

It seems that almost all linear systems of differential 
equations would give rise to dynamical systems exhibit- 
ing a linear parameter sensitivity law with respect to 
global functions measuring convergence time to the fixed 
point and infinity. A more precise and rigorous theory 
of global estimation on linear systems can easily be de- 
veloped. 

In the previous section we have discussed discrete 
maps rather than continuous ones. In those cases time 
is discrete and hence F\ might not be differentiate with 
respect to A. For example, if we take the discrete dy- 
namical system: 

x n+ i = Xx n , X > 1 

then escape time out of a circle of radius M is given by 



Fx(x) = 



log A 



Proposition 3.1 can still be applied in the sense that 
the curves of equal discrete escape time are just ap- 
proximations to curves of the continuous escape time 
G\(x) = log 1 1 -^ 1 1 /log A which satisfies the requirements 
of the proposition. 

3.2 Additional Examples 
3.2.1 The Forced Pendulum 

So far we have only seen discrete dynamical systems 
from complex analytic dynamics exhibiting a positive 
parameter sensitivity exponent (p.s.e.). In this section, 
we provide an example of a continuous time dynamical 
system with high p.s.e. which has a completely differ- 
ent dynamical structure, and enforces the belief for the 
universality of the parameter sensitivity power law. 

In particular, we consider the forced pendulum de- 
scribed by the equation (see [3, 23]) 



i=i 
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(41) 



d 2 e 



de 



-r-z- + A— + (3 sin 9 = y cost 
dt z dt 





Figure 14: Plot of log(<$F Aj5A ) vs. log||<5A|| for the 
quadratic map with c = A + 0.74i, A = —0.12, where 
F\ is escape time out of a circle of radius Ft = 100. A 
resolution of 512 x 512 cells is used, and the domain cho- 
sen is D = [0.3, 0.4] x [0.3, 0.4]. The plot on the top is a 
display of the data in Z^-norm. The slope is d « 0.54. 
The plot on the bottom shows the data in L 2 -, L 3 -, L 4 - 
norm (from top to bottom) with slopes « 0.23, 0.13, 0.08 
respectively. 



(42) 


Pi 


(43) 


P2 


(44) 


P3 


(45) 


Pa 



For parameter values A = 0.1,/? = 1.0,7 = 1 .75 the 
system has (at least) four attracting periodic orbits (each 
having period 2ir). For the time 2-w Poincare return map, 
these orbits are attracting fixed points located at: 

(3.287,0.262) 
(4.301,0.397) 
(0.053,-1.070) 
(0.084,1.608) 

Let us assume the unknown parameter is A, for a given 
a domain D define the global function F\ : D — ► R to 
have discrete representation A\ assigning to each cell 
the label of the fixed point its centerpoint is attracted 
to. This means that if the centerpoint p is attracted 
to pi we assign to the cell the value i. This is a very 
coarse representation of a global function: we do not 
record convergence time or any such information; just 
the label of the attractor. Figure 15 shows a picture of 
the resulting image for a 128 x 128 cell decomposition of 
the domain D = [0, 27r] x [-2,4]. All computations have 
been made with a Bulirsch-Stoer integrator. 

Figure 16 demonstrates that the local parameter sen- 
sitivity power law is obeyed in a very impressive way 
around A = 0.1. The parameter sensitivity exponent 
seems to be 7 = 1 - d « 0.907 This is an enormous 
p.s.e., giving a global parameter estimation algorithm 
with very impressive performance. 

3.2.2 The tent map 

The next step is to consider the simplest possible 
systems for which the parameter sensitivity power law 
holds, and for which an analytic proof is possible. 

To that end, consider the tent map: 

fx = f : / - / 



(46) 



hi*) = 
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\x if x < 0.5, 

A(l-x) if x > 0.5. 

where I = [0, 1] is the unit interval. The global function 
F\ : D — * M,D C / that we will consider, just like 
in previous cases, measures escape time from an interval 
[— M, M]. In particular, we will take M = 1, i.e. measure 
escape times from the unit interval. 

If the slope A is less than 1, then no points ever escape 
from the unit interval. Consequently, F\(x) = 0, for all 
x £ I. 

Let us restrict to the case A > 1. If we define a to be 
1/A, then we notice that the points in the interval 

7i(A) = [a, I -a] 

escape after one iteration of the tent map / A . Moreover, 
points in the intervals 

/1(A) - [cv 2 , cv - a 2 }, 7 2 (A) = [1 - (a - cv 2 ), 1 - cv 2 ] 

escape after 2 iterations, and points in 

/j(A) = [« 3 ,« 2 -« 3 ] 

7 3 2 (A) = [a - (cv 2 -« 3 ),«- cv 3 ] 

/ 3 3 (A) = [l-(cv 2 -« 3 ),l-c* 3 ] 

/ 3 4 (A) = [!-(«- a 3 ), 1 - (a - (a 2 - cv 3 ))] 




Figure 15: The basins of attraction of the four fixed points of the 2n Poincare return map for the forced pendulum 
with A = 0.1,/?= 1.0,7= 175, over the domain D = [0,2tt] x [-2,4]. 
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Figure 16: Plot of \og(8Fx tS x) vs. log||<5A|| for the forced 
pendulum for A = 0.1,/? = 1.0, 7 = 1-75. The are four 
attracting fixed points {pi , P2, P3, P4} for the 2n Poincare 
return map. F\ (x) is just the label i of the fixed point 
to which x is attracted (1 < i < 4). A resolution of 
128 x 128 cells is used, and the domain chosen is D = 
[0,27r] x [—2,4]. Z^-norm is used and the slope of the 
line is approximately 0.093!!! 



escape after 3 iterations. In general, there are 2* -1 in- 
tervals each of which has length /^ = ev fc_1 (l — 2cv) con- 
taining points that escape after k iterations. 

Figure 17 clearly shows that the parameter sensitivity 
power law holds, and that the p.s.e. (the slope of the 
lines in the log-log plots) is an invariant of the system 
independent of the resolution. 

3.2.3 A fractal curve boundary of attraction 

Given 1 < A < 2, consider the map ([14], [18]) M x = 
M:Mx,SUlRx S 1 , defined by 



(47) 

where 

(48) 
(49) 



M(x k ,0 k ) = (x k+u e k+1 ) 

x k+i = \x k + cos6 k 
k +\ = 20 k (mod 27r) 



This map has two attractors +00 and —00 for the 
first component, meaning, if f n (x , 6 ) = (x n ,d n ) then 
lim n _ 00 x n — ±00. M has no finite attractors since the 
eigenvalues of the Jacobian matrix are 2 and A > 1. 

To calculate what the boundary between the basins 
of attraction Ax(±oo), we proceed as in [18]. We first 
notice that given any initial point (xo,0 o ), we have 
9 k = 2 k d . The map M is two to one (and hence nonin- 
vertible), but given any point x N and N = 2 N 6 we can 
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Figure 17: Plots of \og(6F Xt s\) vs. log||<5A|| for the tent 
map, with A = 3, where F\ is escape time out of the 
unit interval, with a maximum number of iterations tol- 
erated equal to 1000. The domain chosen is the unit 
interval. The plots are, from top to bottom, the data 
with a resolution of 200000, 100000, 50000,30000, 10000 
cells. The slope of the linear segments is about d « 0.35, 
i.e. 7 « 0.65. 
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Figure 18: Plot of \og{8F XM ) vs. \og\\8\\\ for the 
map 47, where F x is escape time out of a circle of ra- 
dius R = 150 (maximum number of iterations tolerated 
is 200). The parameter value is A = 1.5, a resolution 
of 256 x 256 cells is used, and the domain chosen is 
D = [-1.3, -1.2] x [0.1,0.2]. 



Figure 19: Plot of \og(8F Xi6X ) vs. log||<$A|| for the 
map 47, where F\ is escape time out of a circle of ra- 
dius R = 150 (maximum number of iterations tolerated 
is 200). The parameter value is A = 1.5, a resolution 
of 256 x 256 cells is used, and the domain chosen is 
D = [-1.3,-1.2] x [0.1,0.2]. The plots are, from top 
to bottom, the data in L 1 , L 2 , L 3 , L 4 norm. 



always select an orbit that ends at (xn,$n), by taking 
Xk-i = X~ x Xk — A -1 cos(2 fc-1 0o)- This orbit starts at 



N-l 



(50) 



xo = \~ N x N - J2^ i+1 cos (2^o) 



The boundary between the two basins A\(±oo) are those 
(x , o ) for which xjy remains finite as N — >• oo. So: 



where 
(51) 



dA x (±oo) = {(x,e):x = f x (6)} 



oo . 

h(0) = -^(j)^ 1 ^(2*0) 



Since A > 1 the above sum converges absolutely and 
uniformly. In addition, the following sum 



dfx(0) 

de 



1 °° 9 



2^^ V A 

i=0 



diverges, because A < 2. Hence the curve f\(6) is nondif- 
ferentiable. Moreover it has been proved (in [21]) f\{9) 
has fractal dimension d c = 2 — In A/ In 2. 

Figure 18 is evidence that the parameter sensitivity 
power law holds locally, and figure 19 proves that the 
power law does not hold pointwise. 
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4 The Effect of Noise on Convergence 

We notice that the information obtained from each orbit 
(number of iterations it takes to enter a neighborhood 
of an attractor) is very insensitive to observation noise, 
i.e. noise that enters in the measurement equation (for 
example equation 11 for the quadratic family) 11 . More- 
over, observation noise is averaged away by taking the 
L p norm of the corresponding images. Consequently, ob- 
servation noise has no important effect on convergence 
properties of the global method. 

It is much more interesting to see how the dynamic 
noise, i.e. noise that enters in the dynamic, evolution 
equation, affects the convergence and accuracy of the 
global approach. It is very important to check that rea- 
sonably small dynamic noise does not destroy conver- 
gence properties of the global algorithms because in all 
systems some sort of small dynamical noise always ex- 
ists. In particular, all numerical simulations introduce 
dynamic noise by rounding off. 

Let us call the the log-log plot of the L p distance of the 
image F x +6\ generated for parameter value A + 6X from 
the noisy image F x corresponding to the true value A as 

We believe this is a remarkable property: using very 
coarse information from each orbit (coarse building blocks), 
we build a structure containing global information which is 
very delicate with respect to changes in parameters (fine over- 
all structure). 



a function of ||<5A|| the performance curve of the global 
algorithm. 

Our experiments have shown that the presence of dy- 
namic noise has the effect of leveling off the performance 
curve of the global algorithm to around a value of the 
order of the difference d p (F\, F\). 

It is very important to note that if the dynamical noise 
is below a certain value, then the local unimodality of the 
dissimilarity function is not destroyed. This is demon- 
strated by the experiments depicted in figures 20 and 
21. 

Intuitively, we would expect the presence of dynamic 
noise not to be disastrous, since the results of sec- 
tions 2.3,2.5 show that on the average there are from 10 
to 150 iterations per orbit, and the dynamic noise does 
not have the time to manifest itself if it is small enough. 
On the contrary, local algorithms attempt to follow a 
nominal trajectory for significantly longer. Hence, the 
presence of dynamic noise can have disastrouss effects 
on the convergence properties of local algorithms, espe- 
cially in the case of chaotic dynamical systems. 

5 Combining Estimation and Control 

Let us onsider again the problem of estimating the pa- 
rameter A in the case of the quadratic system 7, but now 
instead of assuming that £ is fixed, suppose £ is a con- 
trol parameter that we can tune. We want to address the 
question of what value of £ gives optimal performance for 
the global method for estimating A. 

Consider the domain D = [-1.445,-1.37] x [0.05,0.2], 
and observe how the performance plots of the global al- 
gorithm for the given phase space window change as we 
vary A (figure 24). We notice that as we approach the 
boundary of the Mandelbrot set the global algorithm 
reaches better and better accuracy, increasing the pa- 
rameter sensitivity exponent (p.s.e.) 7 = 1 — d for the 
domain. So our experiments suggest that we can max- 
imize the p.s.e. 7 and hence optimize performance of 
the global algorithm by choosing £ so as to minimize the 
distance of c = A + £i from the boundary of Mandelbrot 
set M . 

We remark that we should be careful about the above 
statement since the distance to the Mandelbrot set is not 
the only thing that controls 7: it matters a lot which 
pari of the Mandelbrot set we are close to. Consider for 
example varying A close to the point c = —0.75 + Oi 
(the round top of the main cardioid of the Mandelbrot 
set). The point c = —0.75 is on the boundary of the 
Mandelbrot set, but as we move A along the real axis we 
move inside the Mandelbrot set obtaining a parameter 
sensitivity exponent (p.s.e.) of 7 « 1 — 0.687 = 0.313. A 
much bigger p.s.e. (7 « 1 - 0.218 = 0.782) is obtained 
when we are close to the boundary of M on the vertical 
line A = -1.0. 

We have seen that the boundary of the Mandelbrot 
set is a bifurcation set for the quadratic family, since the 
topology of the Julia set changes as we cross it. We con- 
jecture that for more general systems we can get optimal 
performance out of a global estimation algorithm by tun- 
ing control parameters so as to drive the system near a 
bifurcation set in parameter space. Intuitively we expect 
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Figure 20: Quadratic Family: Plots of log(<5FA,<5A) as 
a function of log||<$A||, 6X > 0, for £ = 0.3, for varius 
strengths of dynamic noise, and fixed observation noise 
with a — 10~ 3 . The curves (from top to bottom) cor- 
respond to dynamic noise a d = 10~ 6 , 10~ 8 , 10~ 10 , 10~ 12 . 
The maximum number of iterations is 100, and the do- 
main is D = {(x,y) : x € [-0.079555, -0.079525], y £ 
[0.265320,0.265350]}. A 512x512 cell resolution is used. 
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Figure 21: Quadratic Family: Plots of log^i^^) as 
a function of log||6A||, 6\ > 0, for £ = 0.3, for varius 
strengths of dynamic noise, and fixed observation noise 
with <r = 10~ 3 . The curves (from top to bottom) cor- 
respond to dynamic noise <r d = 10 -6 , 10~ 8 , 10~ 10 . The 
maximum number of iterations is 100, and the domain 
is D = {(x,y) : x G [-1.445, -1.37], y G [0.05,0.2]}. A 
512 x 512 cell resolution is used. 



to have much more sensitivity to changes in parameters 
(high p.s.e.) as we get close to bifurcations. 

We believe it would be very interesting to develop a 
general theory of optimal global geometric estimation 
through control and in general analyze the dependece of 
the the parameter sensitivity exponent on the position 
in parameter space. 

5.1 A Global Geometric Controller for the 
Quadratic Family 

In this section we will show how to use a result of J. 
Milnor and W. Thurston [25, 27] for bounding the dis- 
tance to the Mandelbrot set M , to obtain a global geo- 
metric controller for the quadratic family. Our method 
for determining the control £ value gives an astonish- 
ingly good performance, improving the initial estimate 
by more than 13 orders of magnitude (figure 26), and 
improving the best estimate that we had for £ = 0.3 by 
more than 7 orders of magnitude for the given domain. 

5.1.1 Bounding the distance to M 

J. Milnor and W. Thurston proved a bound for the 
distance d(c, M) of a point c in the parameter plane 
from the Mandelbrot set M, on which some algorithms 
for producing pictures of the Mandelbrot set are based. 
In particular, they have shown the following result 
(see [27]): 

Theorem 5.1 If c is a point outside of M , then 



(52) 



sinh G{c) 



<d(c,M)< 



2 sinh G(c) 



2expG( C )||G*(c)|| -— - || 6 v (cJ| 

Similar inequalities can be obtained for points inside the 
Mandelbrot set, as well as for the distance of points in 
the dynamical plane from connected Julia sets. 

Let us approximate, somewhat arbitrarily, the dis- 
tance of a point c outside of M , by the estimate of the 
upper bound in inequality 52, i.e. let 



(53) 



d(c,M) 



2 sinh G(c) 

WO'(c)\\ 



For c near M, we may approximate sinhG'(c) by G(c). 
A further approximation to G(c) gives: 



(54) 

where 

(55) 



rf(c,M)«2JMi og || 2n | 



z„ — 



dz n 
dc 
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5.1.2 The Controller 

Using the approximation 54 it is easy to build a pro- 
gram that minimizes the distance to M . First of all in 
order to obtain an estimate for the distance of c from 
M, iterate: 

Z* + l = fc(zk) = Zjfc +c,2 = 0,fc = 0, 1,2,... 

until either ||^+i|| > R (where R is large) or k - N max , 
where N max is the maximum number of iterations that 
we allow (we take N max = 1000). If we stopped at 
k = N max we let d(c, M) = 0. If we have stopped at 




Figure 22: The Mandelbrot Set: D = [-2.45,0.75] x [-1.3, 1.3]. 
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Figure 23: The Mandelbrot Set: D = [-1.1,-0.9] x [0.2,0.4] (magnification of the window shown in figure 22). 
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Figure 24: Plot of log(6F Xi6x ) vs. log||<5A|| for the 
quadratic map with c = A + £i, where F\ is escape time 
out of a circle of radius R = 100, and the cell resolution 
is 512 x 512. The real global function is at A = — 1, and 
is measured with Gaussian observation noise with vari- 
ance a a = 10 -3 and no dynamical noise (cr d = 0). From 
top to bottom, the lines correspond to £ = 0.3, £ = 0.25 
£ = 0.35, <e = 0.2, £ = 0.4. 
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Figure 25: Plot of \og(6F Xt6 \) vs. log||<5A|| for the 
quadratic map with c = A + £i, £ = 0, where F x is 
escape time out of a circle of radius R = 100, and the 
cell resolution is 256 x 256. The real global function is 
at A = -0.75, and the slope is approximately 0.687. 

n < N max , then having saved the orbit {z , . . . , z n } we 
compute: 

4+i = 2**4 + 1, 4 = 0, k = 0, 1, . . . , n - 1 

If we get an overflow in the course of this iteration, then 
this means that c is very close to M and we return 
d(c,M) = — 1. Otherwise, we return 

rf(c,M) = 2J{^J}log||z n || 

A simple program implementing this is shown in fig- 
ure 27. 

The optimal control value £ is the value for which the 
distance estimate d(X + £i, M) is minimized. Figure 28 
shows a listing of the very simple program that achieves 
the value of q which minimizes the estimate 

(MSetDist (make-rectangular p q) maxiter R 
overflow) 

for given values of maxiter, R, overflow. 

We first fix a value of £ and obtain an estimate for 
A to feed the function optimal-control (figure 5.1.2) 
by running a local algorithm (for example an extended 
Kalman filter). The local algorithm (implemented by 
Elmer Hung [17]) that we have used gave an accuracy 
of 10~ 5 - 10~ 6 . The function optimal-control will 
return a value q for £ whose distance from the trully 
optimal value will be of the order of 10~ 5 - 10~ 6 . We 
can then drive £ to the estimate q and run the global 




Figure 26: Plot of \og(6F x ,sx) vs. log||6A|| for the 
quadratic map with c = A + £z, where F\ is escape time 
out of a circle of radius R = 100, and the cell resolution 
is 512 x 512. The real global function is at A = — 1, and 
is measured with Gaussian observation noise with vari- 
ance a = 10 -3 and no dynamical noise (cr^ = 0). From 
top to bottom, the lines correspond to £ = 0.28651237 
(d « 0.218) £ = 0.3 (d « 0.35), £ = 0.25 (d « 0.48), 
£ = 0.35 (d « 0.59), £ = 0.2 (d « 0.58), £ = 0.4 
(d«0.68). 
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(define (HSetDist c maxiter R overflow) 

(let ((zorbit (make-vector (+ max- 
iter 1) 0.0))) 

(define (orbit-loop i z) 
(cond ((> i maxiter) 
0.0) 
((> (magnitude z) R) 

(der-loop 1 i 0.0+0.0i)) 
(else 
(let ((newz (+ (* z z) c))) 
(vector-set! zorbit i newz) 
(orbit-loop (+ i 1) newz))))) 
(define (der-loop i iter zder) 

(let ((z (vector-ref zorbit (- i 1)))) 
(cond ((> (magnitude zder) overflow) 
-1) 

((= i iter) 
(let ((m (magnitude z))) 
(/ (* 2 (* m (log m))) 
(magnitude zder)))) 
(else 
(der- 
loop (+ i 1) iter (+ (* 2 (* z zder)) 1)))))) 
(orbit-loop 0.0+0.0i))) 

Figure 27: A program estimating the distance of a point 
c in the parameter plane from M . 



algorithm. After having obtained a better estimate for p 
(= A) from the global algorithm, we may feed that esti- 
mate into optimal-control to get a better estimate for 
the optimal control value and repeat the whole process, 
until we get satisfactory accuracy. 

6 Cooperation between local and global 
approaches 

We know that the dissimilarity function is locally uni- 
modal but we have to have a way of entering the region 
of unimodality in order for the global algorithm to work. 
One approach is to look at the domain D at different 
cell resolutions. In other words increase the number of 
cells used to represent the global function. As we get to 
coarser and coarser resolutions, the region of unimodal- 
ity starts at bigger and bigger values. A more interesting 
solution to this problem is to use a combination of lo- 
cal and global approaches. In particular, we can use the 
local algorithms to obtain an estimate of the parameter 
that places us in a region of unimodality of the dissimi- 
larity function, and then use a global approach to zero in 
to the correct parameter value. Numerical experiments 
seem to indicate that local methods we have tried, reach 
rather quickly an good estimate of the parameter but 
then get stuck and do not seem to improve further (prob- 
ably because of numerical problems, invalid linearization 
assumptions, or bad finite approximation assumptions). 
The global algorithm, given the estimate obtained by 
the local method, increases it by many orders of mag- 
nitude. It seems that a cooperation between local and 
global approaches is the right way to attack real estima- 
tion problems. 



(define (optimal-control p tol ql q2) 
(define (loop ql q2 dl d2 
(let* ((q (/ (+ ql q2) 2)) 

(c (make-rectangular p q) ) 
(d (msetdist c 1000 1000 lelOO))) 
(cond ((< dl tol) 

qi) 

((= d 0.0) 
(loop ql q 

dl d)) 
(else 
(loop q q2 

d d2))))) 
(loop ql q2 

(msetdist (make- 
rectangular p ql) 1000 1000 lelOO) 

(msetdist (make- 
rectangular p q2) 1000 1000 lelOO))) 

Figure 28: A program returning an estimate of an opti- 
mal control value £, given an estimate p for A and two 
points ql not in M , and q2 inside M . 



7 Final Comments 

In this paper, we propose a new approach to parame- 
ter estimation based on exploiting the global geometri- 
cal complexity of nonlinear dynamical systems, rather 
than trying to do local approximations as the classical 
algorithms do. 

We demonstrate the power of a global approach in the 
context of complex analytic dynamics. Under very rea- 
sonable magnification and noise assumptions, and with a 
careful combination of global estimation and control, we 
reach an improvement as big as 13 orders of magnitude 
over the initial estimate. 

The approach which we follow in the case of complex 
analytic dynamics can be extended to much more general 
settings. 

We remark that the choice for global functions that we 
have made is just one of the many possibilities. We have 
just given one implementation of the much more 
general idea of global parameter estimation. In 
different settings, we are forced to chose different global 
functions or minimization algorithms. For example, in 
the case of Hamiltonian dynamical systems no attractors 
exist and completely different global functions must be 
devised (for example functions that reflect the shape of 
chaotic layers). 

The global approach is computationally much more 
demanding than the local approaches but can be much 
more accurate and can have more robust convergence 
properties. With the use of increasingly faster computers 
the disadvantage mentioned above becomes less and less 
important. Moreover, the global approach is ideally 
suited for parallel computation, opening the door 
to tremendous icrease in performance. 

There are many new directions to follow: first of all the 
power law that the dissimilarity function locally obeys 
needs to be understood thoroughly and analyzed the- 
oretically. Many interesting open questions concerning 
the nature and behavior of the parameter sensitivity ex- 



28 



ponent 7 also arise. To our knowledge these results are 
completely new. 

Moreover, we believe that it would be interesting to 
develop a general theory of optimal global geometric pa- 
rameter estimation through control, and investigate how 
p.s.e. changes with the position of the parameter in pa- 
rameter space. 

The ultimate goal is to use chaos and instability and 
combine the local and global approaches to parameter 
estimation in order to obtain breakthrough extraordi- 
narily precise measurements of quantities that are very 
difficult to measure, such as the Universal Constant of 
Gravity. 
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Appendix 

A The Golden Section Method 

The Golden Section Method uses the fact that we can 
bracket the location of the minimum of a unimodal func- 
tion by evaluating the function at two distinct points in 
the region L of unimodality. 

To describe how it works, we first assume that the 
function <p\ is unimodal in the interval L\ — \} x , r{\. Sup- 
pose we evaluate the function at two points x\,x 2 in L 
such that xi < x 2 and find that /(a?i) < f(x 2 ). It follows 
from the definition of unimodality that A £ [/ 1} x 2 ]. Simi- 
larly, if f(xi) > f(x 2 ), then we must have A £ [x ly n]. If 
the function values at 2:1,2:2 are equal then A £ [xi,ar 2 ], 
but for simplicity we may again consider that it belongs 
to any one of the above bigger intervals. In any case, af- 
ter the first two function evaluations, a portion of L\ to 
the right of 2; 2 or to the left of x\ can be eliminated from 
further search. If L 2 = [l 2 ,r 2 ] is the remaining interval, 
we can obtain two more function evaluations and fur- 
ther reduce the length of the interval containing A. By 
using this procedure we can keep reducing the unimodal- 
ity interval, obtaining an increasingly tighter bracketing 
of the minimum value. 

An improved version of the above naive algorithm is 
the Fibonacci method, which gets its name from using 
the Fibonacci sequence 

(56) 

^o = 0, T x = 1, F k =fk-i+fk-2, * = 2,3,... 
in picking the points at which the function is evaluated. 
The method works as follows: Let N be the total number 
of points at which the function will be evaluated. Sup- 
pose that at iteration k, the interval containing A (the 



local minimum) is [4, r k ]. For k = 1, 2, . . . , N - 1, the 
function values are computed at the two points 

?N-k 



(57) 
(58) 



:{ = '* + 



x\ = l k + 



^N+2-k 
^N+l-k 



in-h) 

(r k - h) 



^N+2-k 

We notice that (by the definition of the Fibonacci se- 
quence) one of the points x\,x\ is the same as one of 
the points at a previous iteration. Hence, only one new 
function evaluation is required at each point. This is ex- 
tremely important in our case where function evaluations 
are computationally very expensive. 

One of the disadvantages of the Fibonacci method 
is that the number of function evaluations N must be 
known in advance. Getting rid of this requirement leads 
to a method known as the Golden section method, which 
is a good approximation to the Fibonacci search. It can 
be shown that 



(59) 



lim 



Tn- 



1 Vb- 1 



N- 



r, 



N 



2 



The golden section method then places the points at 
which the function is to be evaluated at 

1 



(60) 
(61) 



*i = h + 



x k 2 = h + -(rjb 






Again, only one function evaluation is required. 
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